A Hierarchical Parsing Approach with Punctuation Processing for Long Chinese Sentences

Authors

  • Xing Li
  • Chengqing Zong
  • Rile Hu
Abstract

(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China) Abstract: Based on an analysis of the usage and syntactic function of Chinese punctuation marks, this paper proposes a new hierarchical approach to parsing long Chinese sentences. Traditional parsing approaches perform parsing at a single level and give punctuation marks no special treatment. In contrast, our approach uses punctuation marks with special functions to break complex long Chinese sentences into sub-sentences or units (hereafter 'units'), so that the original sentence is parsed unit by unit. This divide-and-conquer idea greatly reduces the difficulty that traditional approaches face in recognizing the syntactic relationships between, and within, sub-sentences and phrases. In addition, grammatical rules containing punctuation marks, together with their probabilities, are extracted from a large-scale Treebank, which is very beneficial for syntactic disambiguation. Our experimental results show that, compared with the traditional Chart parsing algorithm, our approach significantly reduces time consumption and the number of ambiguous edges, and improves both the correct rate and the recall rate by about 7% when parsing long Chinese sentences.
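
The unit-by-unit idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the set of splitting marks (`SPLIT_MARKS`), the function names, and the flat `"S"` node used to join the unit parses are all assumptions made for illustration. The abstract says only that punctuation marks "with special functions" delimit units, and that the rules for combining them come from a Treebank.

```python
import re

# Assumed set of unit-delimiting marks; the abstract does not list the marks used.
SPLIT_MARKS = "，；：。！？"


def split_into_units(sentence, marks=SPLIT_MARKS):
    """Split a sentence into units, keeping each mark attached to the unit it closes."""
    cls = re.escape(marks)
    # One or more non-mark characters, optionally followed by a single mark.
    pattern = "[^{0}]+[{0}]?".format(cls)
    return [u.strip() for u in re.findall(pattern, sentence) if u.strip()]


def parse_hierarchically(sentence, parse_unit):
    """Parse each unit on its own, then join the results under one root.

    `parse_unit` is a hypothetical callback standing in for any unit-level
    parser (for example a chart parser). The flat "S" root is a placeholder
    for the Treebank-derived punctuation rules mentioned in the abstract.
    """
    return ("S", [parse_unit(unit) for unit in split_into_units(sentence)])


if __name__ == "__main__":
    text = "他来了，我们开始开会；会议很长。"
    print(split_into_units(text))
    # A trivial stand-in parser that just wraps each unit.
    print(parse_hierarchically(text, lambda unit: ("UNIT", unit)))
```

Because each unit is parsed separately, a chart parser only has to consider spans inside one unit at a time, which is one plausible reading of why the abstract reports fewer ambiguous edges and less parsing time.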

Similar resources

Systematic Processing of Long Sentences in Rule Based Portuguese-Chinese Machine Translation

Francisco Oliveira, Fai Wong and Iok-Sai Hong. Translation quality and parsing efficiency are often disappointing when rule-based machine translation systems deal with long sentences. Due to the complicated syntactic structure of the language, many ambiguous parse trees can be generated during the tr...

Segmentation of Chinese Long Sentences Using Commas

The comma is the most common form of punctuation. As such, it may have the greatest effect on the syntactic analysis of a sentence. Because Chinese is an isolating language, its sentences offer fewer cues for parsing. The clues for segmenting a long Chinese sentence are even fewer. However, the average frequency of comma usage in Chinese is higher than in other languages. The comma plays an important role i...

Dependency parsing for Chinese long sentence: A second-stage main structure parsing method

This paper explores the problem of parsing long Chinese sentences. Inspired by human sentence processing, a second-stage parsing method, referred to in this paper as main structure parsing, is proposed to improve parsing performance while maintaining high accuracy and efficiency on long Chinese sentences. Three different methods are attempted in this paper, and the results show that ...

Towards a Syntactic Account of Punctuation

Little notice has been taken of punctuation in the field of natural language processing, chiefly due to the lack of any coherent theory on which to base implementations. Some work has been carried out concerning punctuation and parsing, but much of it seems to have been rather ad-hoc and performance-motivated. This paper describes the first step towards the construction of a theoretically-motiv...

Commas and Spaces: The Point of Punctuation

While it has been widely assumed that punctuation may play a critical role in parsing, there has been relatively little direct empirical investigation of its effects. Most researchers have either avoided the use of punctuation or have simply assumed that it will serve a disambiguating role. There has been little or no consideration of how 'disambiguation' might occur or whether it is equally ef...

Publication date: 2005